Fully-Automatic Marker-based Chunking in 11 European Languages and Counts of the Number of Analogies between Chunks

Authors

  • Kota Takeya
  • Yves Lepage
Abstract

Analogy has been proposed as a possible principle for example-based machine translation. For such a framework to work properly, the training data should contain a large number of analogies between sentences. Consequently, such a framework can only work properly with short and repetitive sentences. To handle longer and more varied sentences, cutting the sentences into chunks could be a solution, provided that the number of analogies between chunks is confirmed to be large. This paper therefore reports counts of the number of analogies obtained with different numbers of chunk markers in 11 European languages. These experiments confirm that the number of analogies between chunks is very large: several tens of thousands of analogies were found between chunks extracted from sentences among which very few analogies, if any, were found.
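As a rough illustration of what cutting sentences into chunks at markers can look like, the following Python sketch opens a new chunk at each marker word. It is not the authors' implementation: the function names `top_k_markers` and `marker_chunk` are hypothetical, and the choice of the k most frequent tokens as markers is an assumption made here only so that the number of markers can be varied.

```python
from collections import Counter


def top_k_markers(sentences, k):
    """Return the k most frequent tokens as a stand-in marker set.

    Assumption for illustration only: the abstract does not say how
    the markers are selected.
    """
    counts = Counter(tok for s in sentences for tok in s.split())
    return {tok for tok, _ in counts.most_common(k)}


def marker_chunk(sentence, markers):
    """Split a whitespace-tokenized sentence into chunks.

    A marker opens a new chunk only if the current chunk already holds
    at least one non-marker word, so consecutive markers stay together.
    """
    chunks, current, has_content = [], [], False
    for tok in sentence.split():
        if tok in markers and has_content:
            # Close the current chunk before starting a new one at the marker.
            chunks.append(" ".join(current))
            current, has_content = [], False
        current.append(tok)
        if tok not in markers:
            has_content = True
    if current:
        chunks.append(" ".join(current))
    return chunks
```

For example, calling `marker_chunk("the cat sat on the mat in the house", {"the", "on", "in"})` splits the sentence at "on" and "in", while the marker sequences "on the" and "in the" each stay inside a single chunk.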

Similar resources

Marker-based Chunking for Analogy-based Translation of Chunks

An example-based machine translation (EBMT) system based on analogies requires numerous analogies between linguistic units to work properly. Consequently, long sentences cannot be handled directly in such a framework. In this paper, we inspect the quality of translation of chunks obtained by marker-based chunking in English and French in both directions. Our results show that more than three qu...

A corpus study on the number of true proportional analogies between chunks in two typologically different languages

We measure the number of true proportional analogies between chunks in two typologically different languages on a similar corpus: a 20,000 sentence long Japanese-English bicorpus. We verify that at least 96% of analogies of form between chunks are also analogies of meaning. We conclude that analogy ought to be considered as a reliable structuring device between chunks.

Text Chunking using Transformation-Based Learning

Eric Brill introduced transformation-based learning and showed that it can do part-of-speech tagging with fairly high accuracy. The same method can be applied at a higher level of textual interpretation for locating chunks in the tagged text, including non-recursive “baseNP” chunks. For this purpose, it is convenient to view chunking as a tagging problem by encoding the chunk structure in new ta...

The Impact of Teaching Chunks on Speaking Fluency of Iranian EFL Learners

Research on multiword clusters (chunks) is based on the assumption that native speakers use plenty of chunks in their everyday language and they are considered as fluent speakers of language. Therefore the present study was an attempt to investigate the impact of using chunks on speaking fluency of Iranian EFL learners. In the first phase of the study, the students of two intermediate classes s...

A Text Chunker and Hybrid POS Tagger for Indian Languages

Part-of-Speech (POS) tagging can be described as the task of automatically annotating each word in a text document with its syntactic category. This paper presents a generic hybrid POS tagger for Indian languages. Indian languages are relatively free word order, morphologically productive and agglutinative languages. In this hybrid implementation we have used a combination of statistical approach...


Publication date: 2011